Comments for MEDB 5502, Module 08, survival analysis

Topics to be covered

  • What you will learn
    • A simple example of survival data
    • Overall Kaplan-Meier curve
    • The log rank test
    • The hazard function
    • The Cox regression model
    • Assumptions and data management

Survival analysis

  • Time to event models
    • Death
    • Relapse
    • Rehospitalization
    • Failure of medical device
    • Pregnancy
  • Not every patient experiences the event
    • These are censored observations

First fruit fly experiment, 1 of 4

data_dictionary: fly1.txt
description: |
  This dataset provides a simple example of survival and censoring. It provides an intuitive explanation of how survival probabilities are estimated.
vars:
  day:
    label: Time until death
    unit: days

First fruit fly experiment, 2 of 4

37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68, 70, 71, 72, 73, 75, 77, 79, 89, 94, 96

First fruit fly experiment, 3 of 4

   day   p
 1  37 96%
 2  40 92%
 3  43 88%
 4  44 84%
 5  45 80%
 6  47 76%
 7  49 72%
 8  54 68%
 9  56 64%
10  58 60%
11  59 56%
12  60 52%
13  61 48%
14  62 44%
15  68 40%
16  70 36%
17  71 32%
18  72 28%
19  73 24%
20  75 20%
21  77 16%
22  79 12%
23  89  8%
24  94  4%
25  96  0%
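
The percentages above can be reproduced with a few lines of Python. This is a minimal sketch, not the course's own code: the function name survival_without_censoring is made up for illustration, and it assumes that every death is observed and that no two flies die on the same day (both are true for this data set).

```python
# Death times (in days) for the 25 flies in the first experiment.
days = [37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68,
        70, 71, 72, 73, 75, 77, 79, 89, 94, 96]


def survival_without_censoring(death_days):
    """Return (day, percent still alive) pairs.

    Assumes every death is observed and there are no tied death times,
    so each death lowers the count of survivors by exactly one.
    """
    n = len(death_days)
    table = []
    for deaths_so_far, day in enumerate(sorted(death_days), start=1):
        table.append((day, 100 * (n - deaths_so_far) / n))
    return table


table = survival_without_censoring(days)
for day, pct in table:
    print(f"{day:3d} {pct:3.0f}%")
```

With 25 flies, each death lowers the survival percentage by 100/25 = 4 points, which is the pattern in the table.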

First fruit fly experiment, 4 of 4

Second fruit fly experiment, 1 of 4

37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68, ??, ??, ??, ??, ??, ??, ??, ??, ??, ??

Second fruit fly experiment, 2 of 4

   day event
 1  37     1
 2  40     1
 3  43     1
 4  44     1
 5  45     1
 6  47     1
 7  49     1
 8  54     1
 9  56     1
10  58     1
11  59     1
12  60     1
13  61     1
14  62     1
15  68     1
16  70     0
17  70     0
18  70     0
19  70     0
20  70     0
21  70     0
22  70     0
23  70     0
24  70     0
25  70     0

Second fruit fly experiment, 3 of 4

   day event   p
 1  37     1 96%
 2  40     1 92%
 3  43     1 88%
 4  44     1 84%
 5  45     1 80%
 6  47     1 76%
 7  49     1 72%
 8  54     1 68%
 9  56     1 64%
10  58     1 60%
11  59     1 56%
12  60     1 52%
13  61     1 48%
14  62     1 44%
15  68     1 40%
16  70     0
17  70     0
18  70     0
19  70     0
20  70     0
21  70     0
22  70     0
23  70     0
24  70     0
25  70     0

Second fruit fly experiment, 4 of 4

Third fruit fly experiment, 1 of 4

37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68, ??, 71, ??, ??, 75, ??, ??, 89, ??, 96

Third fruit fly experiment, 2 of 4

   day event
 1  37     1
 2  40     1
 3  43     1
 4  44     1
 5  45     1
 6  47     1
 7  49     1
 8  54     1
 9  56     1
10  58     1
11  59     1
12  60     1
13  61     1
14  62     1
15  68     1
16  70     0
17  71     1
18  70     0
19  70     0
20  75     1
21  70     0
22  70     0
23  89     1
24  70     0
25  96     1

Third fruit fly experiment, 3 of 4

   day event   p
 1  37     1 96%
 2  40     1 92%
 3  43     1 88%
 4  44     1 84%
 5  45     1 80%
 6  47     1 76%
 7  49     1 72%
 8  54     1 68%
 9  56     1 64%
10  58     1 60%
11  59     1 56%
12  60     1 52%
13  61     1 48%
14  62     1 44%
15  68     1 40%
16  70     0
17  71     1 30%
18  70     0
19  70     0
20  75     1 20%
21  70     0
22  70     0
23  89     1 10%
24  70     0
25  96     1  0%
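
The drop from 40% to 30% at day 71 follows from the Kaplan-Meier (product-limit) calculation: six flies are censored at day 70, so only four are still at risk, and one death among four multiplies the survival probability by 3/4. The sketch below works through that arithmetic for the whole table. It is a hand-rolled illustration rather than the software used in the course; the function name kaplan_meier is hypothetical, and it follows the usual convention that a fly censored on a given day still counts as at risk on that day.

```python
def kaplan_meier(days, events):
    """Product-limit estimate of survival.

    days:   observed time for each subject
    events: 1 if the death was observed, 0 if the time is censored
    Returns a dict mapping each death time to the estimated
    probability of surviving beyond that time.
    """
    surv = 1.0
    estimates = {}
    for t in sorted(set(days)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in days if d >= t)
        deaths = sum(1 for d, e in zip(days, events) if d == t and e == 1)
        if deaths > 0:
            surv *= (at_risk - deaths) / at_risk
            estimates[t] = surv
    return estimates


# Third fruit fly experiment, in the same row order as the table.
days = [37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68,
        70, 71, 70, 70, 75, 70, 70, 89, 70, 96]
events = [1] * 15 + [0, 1, 0, 0, 1, 0, 0, 1, 0, 1]

km = kaplan_meier(days, events)
for day, s in km.items():
    print(f"{day:3d} {100 * s:3.0f}%")
```

Censored times such as day 70 produce no new estimate of their own; they only shrink the number at risk for the deaths that follow.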

Third fruit fly experiment, 4 of 4

Interpreting Kaplan-Meier plots, 1 of 3

Interpreting Kaplan-Meier plots, 2 of 3

Interpreting Kaplan-Meier plots, 3 of 3

Break #1

  • What you have learned
    • A simple example of survival data
  • What’s coming next
    • Overall Kaplan-Meier curve

Worcester Heart Attack Study, 1 of 9

data_dictionary: whas500.dat
description: |
  The data represent survival times for a 500-patient subset of the Worcester Heart Attack Study. You can find more information about this data set in Chapter 1 of Hosmer, Lemeshow, and May.

Worcester Heart Attack Study, 2 of 9

id:
  label: a sequential code from 1 to 500
age:
  label: Age at Admission
  unit: years
gender:
  value:
    Male: 0
    Female: 1

Worcester Heart Attack Study, 3 of 9

hr:
  label: Initial Heart Rate
  unit: Beats per minute
sysbp:
  label: Initial Systolic Blood Pressure
  unit: mmHg
diasbp:
  label: Initial Diastolic Blood Pressure
  unit: mmHg

Worcester Heart Attack Study, 4 of 9

bmi:
  label: Body Mass Index
  unit: kg/m^2
cvd:
  label: History of Cardiovascular Disease
  value:
    'FALSE': 0
    'TRUE': 1
afb:
  label: Atrial Fibrillation
  value:
    'FALSE': 0
    'TRUE': 1

Worcester Heart Attack Study, 5 of 9

sho:
  label: Cardiogenic Shock
  value:
    'FALSE': 0
    'TRUE': 1
chf:
  label: Congestive Heart Complications
  value:
    'FALSE': 0
    'TRUE': 1
av3:
  label: Complete Heart Block
  value:
    'FALSE': 0
    'TRUE': 1

Worcester Heart Attack Study, 6 of 9

miord:
  label: MI Order
  value:
    First: 0
    Recurrent: 1
mitype:
  label: MI Type
  value:
    non Q-wave: 0
    Q-wave: 1

Worcester Heart Attack Study, 7 of 9

year:
  label: Cohort Year
  value:
    yr1997: 1
    yr1999: 2
    yr2001: 3
admitdate:
  label: Admission Date
  format: mm/dd/yyyy
disdate:
  label: Hospital Discharge Date
  format: mm/dd/yyyy

Worcester Heart Attack Study, 8 of 9

fdate:
  label: Date of last Follow Up
los:
  label: Length of Hospital Stay
  unit: Days
dstat:
  label: Discharge Status from Hospital
  value:
    Alive: 0
    Dead: 1

Worcester Heart Attack Study, 9 of 9

lenfol:
  label: Follow Up Time
  unit: days
fstat:
  label: Vital Status
  value:
    Alive: 0
    Dead: 1

Event count

Overall Kaplan-Meier curve

Live demo, Overall Kaplan-Meier curve

Break #2

  • What you have learned
    • Overall Kaplan-Meier curve
  • What’s coming next
    • The log rank test

Event count by gender

Histogram of ages

Quality check of age group coding

Event count by age group, 1 of 2

Event count by age group, 2 of 2

Kaplan-Meier analysis by gender, 1 of 3

Kaplan-Meier analysis by gender, 2 of 3

Kaplan-Meier analysis by gender, 3 of 3

Kaplan-Meier analysis by age group, 1 of 3

Kaplan-Meier analysis by age group, 2 of 3

Kaplan-Meier analysis by age group, 3 of 3

Live demo, The log rank test

Break #3

  • What you have learned
    • The log rank test
  • What’s coming next
    • The hazard function

Life insurance example

Probabilities for ages 21 through 41

Probabilities for ages 95 through 99

Why are these probabilities not comparable?

  • Unequal time intervals
    • Fix by computing a rate
  • Non-uniform probabilities over the interval
    • Fix by looking at a narrow interval
  • No adjustment for survivorship
    • Fix by dividing by survival probability

Hazard function, definition

  • \[h(t)=\lim_{\Delta t \rightarrow 0}\frac{P[t \le T \le t+\Delta t]/\Delta t}{P[T \ge t]}\]

  • \[h(t)=\frac{f(t)}{S(t)}\]

    • where \(f\) is the density function, and
    • \(S\) is the survival function (\(S(t)=1-F(t)\))
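
A numerical check can make the two definitions above more concrete. The sketch below uses an exponential distribution, for which the hazard is known to equal the rate parameter at every time point. The rate of 0.1 and all function names are assumptions chosen only for illustration; they do not come from the course data.

```python
import math

RATE = 0.1  # hypothetical event rate per unit time (illustration only)


def survival(t):
    """S(t) = P[T >= t] for an exponential distribution."""
    return math.exp(-RATE * t)


def density(t):
    """f(t) for the same exponential distribution."""
    return RATE * math.exp(-RATE * t)


def hazard_from_limit(t, delta):
    """Approximate h(t) using a small but finite interval width delta."""
    prob_event_in_interval = survival(t) - survival(t + delta)
    return (prob_event_in_interval / delta) / survival(t)


def hazard_from_ratio(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)


for t in (1, 5, 20):
    print(t, hazard_from_limit(t, 1e-6), hazard_from_ratio(t))
```

Both versions stay close to 0.1 at every time point, even though the unconditional probability of an event in a fixed-width interval shrinks as t grows. Dividing by S(t) is what removes that survivorship effect.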

Hazard function, example

Hazard function on a log scale

Break #4

  • What you have learned
    • The hazard function
  • What’s coming next
    • The Cox regression model

Mean ages for men and women

Unadjusted and adjusted Cox regression models for gender

Live demo, The Cox regression model

Break #5

  • What you have learned
    • The Cox regression model
  • What’s coming next
    • Assumptions and data management

Assumptions of the log rank test

  • Independence
    • From one patient to another
    • Of censoring mechanism

Assumptions of the Cox regression model

  • Independence
  • Proportional hazards assumption
  • Possible violations of proportional hazards
    • Survival curves that cross
    • One curve flattening out over time
    • Curves diverge only at later times

Survival curves that cross

One curve flattening out over time

Curves diverge only at later times

Sample size issues

  • Rule of 50
  • Rule of 15

Use ISO format for dates

Understand the internal storage system for dates

Date management

  • The three dates you need
    • the date of origin,
    • the date of the event (if it occurred),
    • the date of last contact with the patient.

The date of origin

  • Rehospitalization
    • use date of first discharge.
  • Failure of a mechanical device
    • use date of implant.
  • Divorce
    • use date of marriage.
  • Loan default
    • use date of loan contract.
  • Infectious disease
    • use date of first exposure.

The date of the event

  • Define your event precisely
    • All-cause mortality
    • Mortality related to the health condition
    • Composite endpoints (e.g., death or relapse)
      • Requires taking the earlier of the two dates.
  • If the event did NOT happen, leave this field blank/missing.
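
For a composite endpoint, the event date is the earliest of the component dates that actually occurred. The sketch below shows one way to compute it, using None to stand for a blank/missing date. The function name composite_event_date and the example dates are hypothetical.

```python
from datetime import date


def composite_event_date(*component_dates):
    """Return the earliest observed component date, or None if none occurred.

    Each argument is a datetime.date, or None when that component
    event (for example death, or relapse) did not happen.
    """
    observed = [d for d in component_dates if d is not None]
    return min(observed) if observed else None


# Hypothetical patient: relapse in March, death in May.
death = date.fromisoformat("2021-05-01")
relapse = date.fromisoformat("2021-03-01")
print(composite_event_date(death, relapse))

# Hypothetical patient with neither event: the field stays missing.
print(composite_event_date(None, None))
```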

The date of last contact

  • If event did not occur
    • Must be specified
    • Typically last medical exam or last telephone contact
  • If event did occur
    • Make same as event date, or
    • Leave blank

Survival calculations, 1 of 2

  • Time = max(Date of event, Date of last contact) - Date of origin, where a missing date is ignored when taking the max
  • Censoring variable = 0 if Date of event is missing (censored), 1 if not (event observed)
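
The two rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's own code: the function name survival_time_and_status and the example dates are hypothetical, missing dates are represented by None, and a missing date is simply skipped when taking the maximum.

```python
from datetime import date


def survival_time_and_status(origin, event_date, last_contact):
    """Return (time in days, event indicator).

    event_date is None when the event did not occur.
    last_contact may be None when the event did occur.
    """
    known = [d for d in (event_date, last_contact) if d is not None]
    if not known:
        raise ValueError("need an event date or a date of last contact")
    time_in_days = (max(known) - origin).days
    status = 0 if event_date is None else 1
    return time_in_days, status


origin = date.fromisoformat("2020-01-01")

# Hypothetical patient with an event; last contact left blank.
with_event = survival_time_and_status(
    origin, date.fromisoformat("2020-01-31"), None)

# Hypothetical patient without an event; censored at last contact.
censored = survival_time_and_status(
    origin, None, date.fromisoformat("2020-03-01"))

print(with_event, censored)
```

Subtracting two date objects gives a timedelta, so the .days attribute yields follow-up time in whole days.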

Survival calculations, 2 of 2

Summary

  • What you have learned
    • A simple example of survival data
    • Overall Kaplan-Meier curve
    • The log rank test
    • The hazard function
    • The Cox regression model
    • Assumptions and data management

Additional topics??